Xuanjing Huang

AdaptR1: Reinforcement Learning Based Adaptive Interleaved Thinking in Multi-hop Question Answering

May 29, 2026
Via arXiv

Prompt-Level Reward Specifications for Open-Ended Post-Training

May 28, 2026
Via arXiv

Rethinking Agentic RAG: Toward LLM-Driven Logical Retrieval Beyond Embeddings

May 26, 2026
Via arXiv

LLMEval-Logic: A Solver-Verified Chinese Benchmark for Logical Reasoning of LLMs with Adversarial Hardening

May 19, 2026
Via arXiv

Entropy Polarity in Reinforcement Fine-Tuning: Direction, Asymmetry, and Control

May 14, 2026
Via arXiv

ToolCUA: Towards Optimal GUI-Tool Path Orchestration for Computer Use Agents

May 12, 2026
Via arXiv

World Action Models: The Next Frontier in Embodied AI

May 12, 2026
Via arXiv

Agentic Harness Engineering: Observability-Driven Automatic Evolution of Coding-Agent Harnesses

Apr 28, 2026
Via arXiv

Beyond Rating: A Comprehensive Evaluation and Benchmark for AI Reviews

Apr 22, 2026
Via arXiv

EVPO: Explained Variance Policy Optimization for Adaptive Critic Utilization in LLM Post-Training

Apr 21, 2026
Via arXiv